feat: Plumb Parquet virtual columns (row_number) through TableSchema and ParquetOpener#22026

Open
mbutrovich wants to merge 17 commits into
apache:main from mbutrovich:virtual-columns-table-schema

Conversation

@mbutrovich
Contributor

@mbutrovich mbutrovich commented May 5, 2026

Which issue does this PR close?

Rationale for this change

arrow-rs 57.1.0+ supports Parquet virtual columns (row_number, row_group_index) via ArrowReaderOptions::with_virtual_columns, and DataFusion pins a new-enough arrow-rs for the API to be available. DataFusion does not yet plumb the option through ParquetOpener, so consumers (notably Comet) cannot project Spark's _tmp_metadata_row_index through the native_datafusion scan path.

This PR adds the minimal opener-boundary plumbing so TableSchema can carry virtual columns and the Parquet reader produces them. UX / SQL-layer surface for virtual columns stays deferred to the epic in #20135 — this follows the same framing alamb blessed for #20071 (the input_file_name() UDF).

What changes are included in this PR?

  • TableSchema::with_virtual_columns(...) builder + virtual_columns() getter. Layout: [file, partition, virtual]. Composable with with_table_partition_cols in either order.
  • TableSchema::schema_without_virtual_columns() — file + partition schema used by pushdown-planning paths that can't evaluate virtual-col refs.
  • ParquetOpener forwards the fields to ArrowReaderOptions::with_virtual_columns; augments the schemas passed to the expr-adapter / simplifier with virtual fields so virtual-col refs identity-rewrite; strips them from the projection fed to ProjectionMask::roots (which only understands file columns) and appends them to stream_schema so reassign_expr_columns resolves them by name.
  • New ParquetVirtualColumn enum with TryFrom<&FieldRef> (in datasource-parquet::virtual_column) gates which arrow-rs virtual extension types are accepted. Currently only RowNumber; adding a variant (e.g. RowGroupIndex) is a compile-time obligation. Replaces the earlier runtime string-allowlist so the contract lives in the type system.
  • ParquetSource::try_pushdown_filters classifies filters against the file+partition schema (not the full table schema) so predicates referencing virtual columns are reported as PushedDown::No and the FilterExec stays above the scan — arrow-rs's RowFilter addresses parquet leaves only and can't evaluate virtual-column refs, so silently pushing them would produce wrong results.
  • Defensive check in the opener: build_virtual_columns_state (run once per scan partition at morselizer-build time) errors when pushdown_filters=true and the predicate references a virtual column, with a clear remediation message pointing at try_pushdown_filters. This catches callers that bypass the optimizer and set the predicate on ParquetSource directly.
  • arrow-schema added as a direct dep (previously transitive via arrow) so the enum references RowNumber::NAME from arrow-rs instead of hardcoding the string.
  • Explicitly not in scope (follow-ups): ListingTable / SQL-layer surface, a three-arg constructor on TableSchema, ParquetSource::with_virtual_columns, and RowGroupIndex support.
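The `[file, partition, virtual]` layout contract described above can be sketched with a toy model. This is a standalone illustration, not DataFusion's real API: `TableSchemaSketch` uses plain `Vec<String>` column lists instead of Arrow schemas, and all names here are hypothetical.

```rust
// Toy model of the builder contract: whatever order the builder methods are
// called in, the resulting layout is always [file, partition, virtual].
#[derive(Clone, Default)]
struct TableSchemaSketch {
    file_cols: Vec<String>,
    partition_cols: Vec<String>,
    virtual_cols: Vec<String>,
}

impl TableSchemaSketch {
    fn new(file_cols: Vec<String>) -> Self {
        Self { file_cols, ..Default::default() }
    }

    fn with_table_partition_cols(mut self, cols: Vec<String>) -> Self {
        self.partition_cols = cols;
        self
    }

    fn with_virtual_columns(mut self, cols: Vec<String>) -> Self {
        self.virtual_cols = cols;
        self
    }

    /// Full table layout: [file, partition, virtual].
    fn table_schema(&self) -> Vec<String> {
        self.file_cols
            .iter()
            .chain(&self.partition_cols)
            .chain(&self.virtual_cols)
            .cloned()
            .collect()
    }

    /// File + partition only, for pushdown-planning paths that cannot
    /// evaluate virtual-column references.
    fn schema_without_virtual_columns(&self) -> Vec<String> {
        self.file_cols
            .iter()
            .chain(&self.partition_cols)
            .cloned()
            .collect()
    }
}

fn main() {
    let s = |v: &[&str]| v.iter().map(|x| x.to_string()).collect::<Vec<_>>();
    // Builder calls in either order produce the same layout.
    let a = TableSchemaSketch::new(s(&["c1"]))
        .with_table_partition_cols(s(&["day"]))
        .with_virtual_columns(s(&["row_number"]));
    let b = TableSchemaSketch::new(s(&["c1"]))
        .with_virtual_columns(s(&["row_number"]))
        .with_table_partition_cols(s(&["day"]));
    assert_eq!(a.table_schema(), b.table_schema());
    assert_eq!(a.table_schema(), s(&["c1", "day", "row_number"]));
    assert_eq!(a.schema_without_virtual_columns(), s(&["c1", "day"]));
    println!("layout: {:?}", a.table_schema());
}
```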

Are these changes tested?

Yes. New unit tests in opener.rs:

  • test_row_index_basic — single row group, select data + row_number.
  • test_row_index_projection_only — select only row_number.
  • test_row_index_multi_row_group — 3 × 100 rows, verify absolute 0..300 across boundaries.
  • test_row_index_with_row_group_skip — predicate stats-prunes the middle row group; verify row numbers stay absolute (0..100 ++ 200..300). Critical correctness gate for Spark (and for arrow-rs#8863, "Fix RowNumberReader when not all row groups are selected").
  • test_row_index_with_partition_cols — partition + virtual + data columns compose correctly.
  • test_row_index_nullable_int64 — nullability flag flows through unchanged (matches Spark's _tmp_metadata_row_index declaration).
  • test_unsupported_virtual_extension_type_rejected — using RowGroupIndex (a real arrow-rs type deliberately not in the enum yet) errors with NotImplemented instead of silently forwarding.
  • test_row_index_predicate_pushdown_mixed_or_errors / _virtual_only_errors / _allowed_when_pushdown_disabled — exercise the opener's defensive check for virtual-col predicate refs with pushdown_filters=true, and confirm the pushdown_filters=false path is unaffected.

In source.rs: test_try_pushdown_filters_rejects_virtual_column_refs pins the planner-boundary contract — file-col filters are PushedDown::Yes, virtual-only and mixed filters are PushedDown::No.

In virtual_column.rs: unit tests covering TryFrom<&FieldRef> for valid, missing-extension-type, and unsupported-extension-type inputs.

Plus a TableSchema unit test verifying the [file, partition, virtual] layout is stable regardless of builder-call order.

Are there any user-facing changes?

Public API additions: TableSchema::with_virtual_columns(...), TableSchema::virtual_columns(), TableSchema::schema_without_virtual_columns(), and ParquetVirtualColumn (re-exported from datafusion-datasource-parquet). No existing API changed; no breaking changes.

mbutrovich added 2 commits May 5, 2026 13:21
…and ParquetOpener, gated behind a tested-only extension-type allowlist, to unblock Comet's native-DataFusion support for Spark's _tmp_metadata_row_index.
@github-actions github-actions Bot added the datasource Changes to the datasource crate label May 5, 2026
@adriangb
Contributor

adriangb commented May 5, 2026

My main concern is #22026 (comment).

The various schemas in opener.rs are already quite complex, this risks making it worse.

@mbutrovich
Contributor Author

> My main concern is #22026 (comment).
>
> The various schemas in opener.rs are already quite complex, this risks making it worse.

Thanks for the review @adriangb! Agreed it could make things more complicated, but if DataFusion is ever going to support these virtual columns it might be unavoidable. I think it's good to hash this stuff out in the smallest possible PR at the opener level. I'll push an update later today.

@mbutrovich
Contributor Author

Thanks again for the review @adriangb! Hopefully I addressed all of the feedback, but happy to keep chatting about it.

Mixed virtual/file predicates with pushdown_filters=true

Confirmed the silent-drop bug with failing tests. Root cause: ParquetSource::try_pushdown_filters called can_expr_be_pushed_down_with_schemas with the full table schema (now including virtual columns), so filters referencing row_number were marked PushedDown::Yes → FilterExec removed → the scan's build_row_filter couldn't resolve the virtual-col ref against physical_file_schema and silently dropped the conjunct.

Arrow-rs can't accept virtual-column refs in a RowFilter at all: ArrowPredicate::projection() returns a ProjectionMask over parquet leaves only, and virtual columns are synthesized after filter evaluation. So virtual columns are projectable but never pushable.

Fix: added TableSchema::schema_without_virtual_columns() (file + partition, excluding virtual) and try_pushdown_filters uses that. Virtual-col filters are now reported PushedDown::No and the FilterExec stays above the scan.
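The classification rule can be sketched as a standalone toy (this is not the real try_pushdown_filters; classify_filter and the column-name sets are illustrative): a filter is pushable only if every column it references lives in the file+partition schema, so any virtual-column reference keeps the whole conjunct above the scan.

```rust
// Illustrative sketch of the planner-boundary rule: classify each filter
// against the file+partition schema only, never the full table schema.
#[derive(Debug, PartialEq)]
enum PushedDown {
    Yes,
    No,
}

fn classify_filter(referenced_cols: &[&str], file_and_partition_cols: &[&str]) -> PushedDown {
    if referenced_cols
        .iter()
        .all(|c| file_and_partition_cols.contains(c))
    {
        PushedDown::Yes
    } else {
        // The predicate references a column outside the file+partition
        // schema (e.g. the virtual row_number column). arrow-rs's RowFilter
        // addresses parquet leaves only, so pushing this down would
        // silently drop the conjunct; report No and keep the FilterExec.
        PushedDown::No
    }
}

fn main() {
    let file_cols = ["pk", "day"];
    // Plain file-column filters are still pushed down.
    assert_eq!(classify_filter(&["pk"], &file_cols), PushedDown::Yes);
    // Virtual-only and mixed predicates both stay above the scan.
    assert_eq!(classify_filter(&["row_number"], &file_cols), PushedDown::No);
    assert_eq!(classify_filter(&["row_number", "pk"], &file_cols), PushedDown::No);
    println!("classification ok");
}
```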

Defense-in-depth in the opener for callers who bypass the optimizer (e.g. manual plan builders): prepare_open_file rejects pushdown_filters=true + virtual-col predicate with a clear error pointing at with_pushdown_filters(false) or keeping the filter above the scan.

Tests: source.rs::test_try_pushdown_filters_rejects_virtual_column_refs (planner boundary), plus three opener-level tests covering mixed OR, virtual-only, and the allowed pushdown_filters=false case.

Ordering doc on virtual_columns

Struct field doc now spells out the [file, partition, virtual] layout, matching the builder methods.

Enum + TryFrom

Added ParquetVirtualColumn with TryFrom<&FieldRef> in a new virtual_column.rs. The runtime allowlist in the opener is replaced with ParquetVirtualColumn::try_from(field)?. Adding a new variant (e.g. RowGroupIndex) is now a compile-time obligation, and consumers can pattern-match instead of string-comparing extension-type names. Exposed as pub use ParquetVirtualColumn at the crate root.
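A minimal standalone sketch of the pattern, assuming a toy FieldSketch type in place of arrow's FieldRef; the extension-type string "arrow.virtual.row_number" is an assumed placeholder, not necessarily the exact name arrow-rs uses:

```rust
// Sketch of the enum + TryFrom gate: the set of accepted virtual extension
// types lives in the type system rather than a runtime string allowlist.
#[derive(Debug, PartialEq)]
enum ParquetVirtualColumn {
    RowNumber,
    // Adding RowGroupIndex later is a compile-time obligation: every match
    // on this enum must be extended to handle the new variant.
}

// Stand-in for arrow's FieldRef: a name plus an optional extension type.
struct FieldSketch {
    name: String,
    extension_type: Option<String>,
}

impl TryFrom<&FieldSketch> for ParquetVirtualColumn {
    type Error = String;

    fn try_from(field: &FieldSketch) -> Result<Self, Self::Error> {
        match field.extension_type.as_deref() {
            Some("arrow.virtual.row_number") => Ok(Self::RowNumber),
            Some(other) => Err(format!(
                "unsupported virtual extension type on {}: {other}",
                field.name
            )),
            None => Err(format!("field {} has no virtual extension type", field.name)),
        }
    }
}

fn main() {
    let ok = FieldSketch {
        name: "row_number".into(),
        extension_type: Some("arrow.virtual.row_number".into()),
    };
    assert_eq!(
        ParquetVirtualColumn::try_from(&ok),
        Ok(ParquetVirtualColumn::RowNumber)
    );
    // An extension type deliberately not in the enum yet is rejected
    // instead of being silently forwarded.
    let bad = FieldSketch {
        name: "rg".into(),
        extension_type: Some("arrow.virtual.row_group_index".into()),
    };
    assert!(ParquetVirtualColumn::try_from(&bad).is_err());
    println!("gate ok");
}
```

Consumers can then pattern-match on the enum instead of comparing extension-type names as strings.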

@mbutrovich mbutrovich requested a review from adriangb May 5, 2026 20:52
@adriangb
Contributor

adriangb commented May 5, 2026

I think this would then have a negative interaction with the goal of turning filter pushdown on by default. Maybe we'll always have to apply some filters as a FilterExec and that's fine...

@mbutrovich
Contributor Author

mbutrovich commented May 5, 2026

> I think this would then have a negative interaction with the goal of turning filter pushdown on by default. Maybe we'll always have to apply some filters as a FilterExec and that's fine...

Comet conservatively never removes FilterExec nodes above scans with pushed down filters, though that maybe shouldn't be the case.

Wouldn't this only prevent filter pushdown for filters that reference virtual columns?

@adriangb
Contributor

adriangb commented May 5, 2026

> Wouldn't this only prevent filter pushdown for filters that reference virtual columns?

Yeah but it means we'll have to keep the split forever. Which might have been the case anyway and maybe a non issue.

And any filter that does reference virtual columns cannot be pushed down, even if part of it would benefit from it, e.g. row_id = 1 AND pk = 1; but I'm not sure that's a realistic scenario. In the past we prevented pushdown of partition columns and that was a real issue: we'd see queries in prod from users along the lines of day = '...' OR pk = 1 that could not get pushed down.

@adriangb
Contributor

adriangb commented May 5, 2026

I plan to give this another review tomorrow.

@comphead
Contributor

comphead commented May 5, 2026

run benchmark tpch tpcds

@comphead
Contributor

comphead commented May 5, 2026

@mbutrovich from high level perspective how row_number virtual column would work when reading multiple parquet files?

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4383929017-2034-5dnfv 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing virtual-columns-table-schema (bd513ec) to 2c7af17 (merge-base) diff using: tpcds
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4383929017-2033-f8cjt 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing virtual-columns-table-schema (bd513ec) to 2c7af17 (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and virtual-columns-table-schema
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃   virtual-columns-table-schema ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 40.03 / 41.45 ±1.55 / 43.79 ms │ 39.49 / 40.62 ±1.19 / 42.39 ms │ no change │
│ QQuery 2  │ 21.07 / 21.56 ±0.69 / 22.87 ms │ 20.75 / 20.84 ±0.10 / 21.04 ms │ no change │
│ QQuery 3  │ 35.68 / 38.24 ±1.33 / 39.30 ms │ 35.44 / 37.61 ±1.64 / 39.23 ms │ no change │
│ QQuery 4  │ 18.04 / 18.37 ±0.17 / 18.52 ms │ 18.06 / 18.12 ±0.05 / 18.19 ms │ no change │
│ QQuery 5  │ 43.56 / 45.18 ±2.01 / 48.95 ms │ 43.18 / 44.20 ±0.87 / 45.76 ms │ no change │
│ QQuery 6  │ 17.06 / 17.19 ±0.14 / 17.45 ms │ 17.06 / 17.16 ±0.08 / 17.28 ms │ no change │
│ QQuery 7  │ 49.86 / 50.64 ±0.54 / 51.27 ms │ 50.04 / 52.14 ±2.32 / 56.63 ms │ no change │
│ QQuery 8  │ 46.49 / 46.74 ±0.14 / 46.88 ms │ 46.50 / 46.95 ±0.65 / 48.22 ms │ no change │
│ QQuery 9  │ 51.78 / 52.17 ±0.28 / 52.54 ms │ 51.72 / 52.33 ±0.52 / 53.01 ms │ no change │
│ QQuery 10 │ 65.29 / 65.42 ±0.11 / 65.57 ms │ 65.11 / 65.91 ±1.20 / 68.29 ms │ no change │
│ QQuery 11 │ 13.62 / 14.10 ±0.63 / 15.35 ms │ 13.68 / 14.39 ±1.31 / 17.00 ms │ no change │
│ QQuery 12 │ 26.16 / 26.42 ±0.24 / 26.78 ms │ 26.36 / 26.73 ±0.28 / 27.10 ms │ no change │
│ QQuery 13 │ 35.63 / 36.37 ±0.51 / 36.97 ms │ 35.10 / 36.02 ±0.71 / 36.92 ms │ no change │
│ QQuery 14 │ 26.54 / 27.04 ±0.62 / 28.24 ms │ 26.64 / 26.83 ±0.15 / 27.07 ms │ no change │
│ QQuery 15 │ 32.68 / 32.81 ±0.10 / 32.95 ms │ 32.57 / 33.23 ±0.62 / 34.39 ms │ no change │
│ QQuery 16 │ 15.17 / 15.27 ±0.06 / 15.36 ms │ 15.10 / 15.24 ±0.11 / 15.42 ms │ no change │
│ QQuery 17 │ 75.04 / 76.49 ±0.95 / 77.33 ms │ 75.97 / 77.19 ±1.14 / 79.00 ms │ no change │
│ QQuery 18 │ 67.84 / 68.82 ±0.96 / 70.42 ms │ 67.31 / 68.81 ±0.94 / 69.99 ms │ no change │
│ QQuery 19 │ 37.52 / 37.65 ±0.13 / 37.90 ms │ 37.42 / 37.70 ±0.22 / 38.08 ms │ no change │
│ QQuery 20 │ 38.52 / 38.72 ±0.15 / 38.88 ms │ 38.62 / 39.10 ±0.33 / 39.53 ms │ no change │
│ QQuery 21 │ 58.33 / 59.44 ±0.83 / 60.37 ms │ 59.62 / 60.74 ±0.71 / 61.68 ms │ no change │
│ QQuery 22 │ 23.78 / 23.97 ±0.18 / 24.28 ms │ 23.64 / 24.06 ±0.42 / 24.80 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 854.08ms │
│ Total Time (virtual-columns-table-schema)   │ 855.91ms │
│ Average Time (HEAD)                         │  38.82ms │
│ Average Time (virtual-columns-table-schema) │  38.90ms │
│ Queries Faster                              │        0 │
│ Queries Slower                              │        0 │
│ Queries with No Change                      │       22 │
│ Queries with Failure                        │        0 │
└─────────────────────────────────────────────┴──────────┘

Resource Usage (tpch)

| Metric      | base (merge-base) | branch  |
|-------------|-------------------|---------|
| Wall time   | 5.0s              | 5.0s    |
| Peak memory | 5.5 GiB           | 5.5 GiB |
| Avg memory  | 5.0 GiB           | 5.0 GiB |
| CPU user    | 32.0s             | 31.9s   |
| CPU sys     | 2.2s              | 2.3s    |
| Peak spill  | 0 B               | 0 B     |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and virtual-columns-table-schema
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃          virtual-columns-table-schema ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.47 / 7.04 ±0.91 / 8.85 ms │           6.37 / 6.84 ±0.83 / 8.50 ms │     no change │
│ QQuery 2  │        82.70 / 83.63 ±0.58 / 84.33 ms │        83.93 / 84.87 ±0.51 / 85.28 ms │     no change │
│ QQuery 3  │        31.32 / 31.64 ±0.18 / 31.82 ms │        31.09 / 31.27 ±0.15 / 31.53 ms │     no change │
│ QQuery 4  │    563.85 / 580.17 ±13.51 / 595.57 ms │     573.63 / 590.61 ±9.15 / 599.40 ms │     no change │
│ QQuery 5  │        55.02 / 56.17 ±1.00 / 57.36 ms │        55.18 / 55.96 ±0.51 / 56.58 ms │     no change │
│ QQuery 6  │        38.16 / 38.73 ±0.55 / 39.46 ms │        38.26 / 39.97 ±2.02 / 43.82 ms │     no change │
│ QQuery 7  │     116.31 / 118.29 ±1.77 / 121.63 ms │     115.76 / 117.67 ±1.98 / 121.39 ms │     no change │
│ QQuery 8  │        41.49 / 41.70 ±0.24 / 42.13 ms │        40.89 / 40.96 ±0.07 / 41.06 ms │     no change │
│ QQuery 9  │        55.77 / 59.88 ±2.62 / 63.46 ms │        54.56 / 57.98 ±2.05 / 60.75 ms │     no change │
│ QQuery 10 │        86.67 / 87.58 ±0.77 / 88.54 ms │        85.72 / 86.23 ±0.52 / 87.06 ms │     no change │
│ QQuery 11 │    351.22 / 364.88 ±11.43 / 377.34 ms │     363.72 / 370.19 ±3.52 / 373.64 ms │     no change │
│ QQuery 12 │        30.78 / 31.07 ±0.18 / 31.24 ms │        30.51 / 30.80 ±0.26 / 31.28 ms │     no change │
│ QQuery 13 │     138.03 / 138.75 ±0.79 / 140.19 ms │     134.83 / 136.57 ±1.54 / 138.86 ms │     no change │
│ QQuery 14 │     527.83 / 534.55 ±3.64 / 538.71 ms │     532.98 / 536.60 ±1.87 / 538.43 ms │     no change │
│ QQuery 15 │        63.93 / 65.02 ±1.28 / 67.43 ms │        66.70 / 69.00 ±1.77 / 71.51 ms │  1.06x slower │
│ QQuery 16 │           7.18 / 7.40 ±0.23 / 7.83 ms │           7.29 / 7.43 ±0.11 / 7.57 ms │     no change │
│ QQuery 17 │        86.73 / 88.11 ±1.57 / 90.98 ms │        85.31 / 87.53 ±2.44 / 92.00 ms │     no change │
│ QQuery 18 │     163.53 / 164.50 ±0.72 / 165.59 ms │     159.80 / 163.91 ±2.25 / 166.47 ms │     no change │
│ QQuery 19 │        44.35 / 44.67 ±0.26 / 45.13 ms │        44.77 / 45.03 ±0.39 / 45.80 ms │     no change │
│ QQuery 20 │        37.61 / 38.23 ±0.45 / 38.98 ms │        37.94 / 38.58 ±0.43 / 39.23 ms │     no change │
│ QQuery 21 │        19.01 / 19.36 ±0.20 / 19.59 ms │        19.36 / 19.60 ±0.18 / 19.88 ms │     no change │
│ QQuery 22 │        64.17 / 65.22 ±0.94 / 66.71 ms │        68.63 / 69.51 ±0.58 / 70.44 ms │  1.07x slower │
│ QQuery 23 │    504.82 / 524.83 ±20.74 / 562.39 ms │     510.61 / 522.29 ±9.80 / 533.95 ms │     no change │
│ QQuery 24 │     249.60 / 252.27 ±2.50 / 255.57 ms │     250.70 / 259.29 ±7.06 / 271.39 ms │     no change │
│ QQuery 25 │     120.57 / 122.44 ±1.79 / 125.60 ms │     121.80 / 123.77 ±1.59 / 126.18 ms │     no change │
│ QQuery 26 │        76.56 / 77.43 ±0.76 / 78.68 ms │        76.28 / 77.69 ±0.90 / 78.91 ms │     no change │
│ QQuery 27 │           7.11 / 7.26 ±0.14 / 7.53 ms │           7.38 / 7.65 ±0.15 / 7.77 ms │  1.05x slower │
│ QQuery 28 │        65.63 / 67.13 ±0.81 / 68.00 ms │        65.96 / 67.41 ±0.74 / 67.93 ms │     no change │
│ QQuery 29 │     105.15 / 107.19 ±1.27 / 109.13 ms │     106.33 / 107.90 ±2.12 / 111.97 ms │     no change │
│ QQuery 30 │                                  FAIL │                                  FAIL │  incomparable │
│ QQuery 31 │     117.44 / 118.81 ±1.06 / 120.34 ms │     117.93 / 120.44 ±1.44 / 121.93 ms │     no change │
│ QQuery 32 │        22.60 / 22.92 ±0.18 / 23.14 ms │        22.60 / 22.97 ±0.23 / 23.24 ms │     no change │
│ QQuery 33 │        42.06 / 42.87 ±0.61 / 43.74 ms │        41.40 / 42.57 ±1.83 / 46.21 ms │     no change │
│ QQuery 34 │        10.86 / 11.45 ±0.43 / 12.08 ms │        10.67 / 11.08 ±0.33 / 11.53 ms │     no change │
│ QQuery 35 │        85.94 / 87.24 ±1.76 / 90.64 ms │        85.23 / 85.60 ±0.34 / 86.16 ms │     no change │
│ QQuery 36 │           6.91 / 7.06 ±0.10 / 7.21 ms │           6.55 / 6.71 ±0.13 / 6.92 ms │     no change │
│ QQuery 37 │           7.69 / 7.82 ±0.09 / 7.93 ms │           7.52 / 7.76 ±0.16 / 7.98 ms │     no change │
│ QQuery 38 │        73.83 / 74.10 ±0.29 / 74.63 ms │        76.04 / 76.73 ±0.57 / 77.56 ms │     no change │
│ QQuery 39 │     105.55 / 107.98 ±2.12 / 110.73 ms │     109.68 / 111.93 ±1.47 / 114.04 ms │     no change │
│ QQuery 40 │        24.25 / 24.44 ±0.10 / 24.51 ms │        24.98 / 25.22 ±0.19 / 25.57 ms │     no change │
│ QQuery 41 │        14.39 / 14.59 ±0.13 / 14.77 ms │        15.22 / 15.33 ±0.07 / 15.42 ms │  1.05x slower │
│ QQuery 42 │        25.59 / 26.12 ±0.36 / 26.66 ms │        26.28 / 26.66 ±0.33 / 27.10 ms │     no change │
│ QQuery 43 │           5.65 / 5.76 ±0.10 / 5.89 ms │           5.84 / 6.56 ±0.91 / 8.35 ms │  1.14x slower │
│ QQuery 44 │        11.66 / 11.80 ±0.08 / 11.91 ms │        11.75 / 12.08 ±0.25 / 12.51 ms │     no change │
│ QQuery 45 │        45.21 / 47.41 ±1.80 / 49.02 ms │        47.82 / 48.61 ±1.29 / 51.19 ms │     no change │
│ QQuery 46 │        14.16 / 14.51 ±0.27 / 14.87 ms │        14.85 / 15.15 ±0.23 / 15.47 ms │     no change │
│ QQuery 47 │     252.56 / 265.14 ±7.41 / 275.21 ms │     250.20 / 253.65 ±2.99 / 258.15 ms │     no change │
│ QQuery 48 │     109.27 / 110.30 ±1.01 / 112.03 ms │     109.47 / 110.67 ±1.38 / 113.27 ms │     no change │
│ QQuery 49 │        85.89 / 86.30 ±0.24 / 86.62 ms │        86.03 / 87.00 ±0.62 / 87.98 ms │     no change │
│ QQuery 50 │        63.08 / 64.30 ±1.68 / 67.59 ms │        63.11 / 65.72 ±2.31 / 69.81 ms │     no change │
│ QQuery 51 │       93.81 / 97.35 ±2.10 / 100.26 ms │       96.59 / 98.01 ±1.29 / 100.06 ms │     no change │
│ QQuery 52 │        26.20 / 27.15 ±1.01 / 29.08 ms │        25.82 / 26.11 ±0.25 / 26.41 ms │     no change │
│ QQuery 53 │        32.39 / 32.49 ±0.08 / 32.62 ms │        32.17 / 33.24 ±1.49 / 36.18 ms │     no change │
│ QQuery 54 │        57.61 / 58.25 ±0.51 / 59.05 ms │        56.43 / 58.52 ±2.13 / 62.45 ms │     no change │
│ QQuery 55 │        25.19 / 25.68 ±0.51 / 26.65 ms │        25.73 / 26.27 ±0.31 / 26.66 ms │     no change │
│ QQuery 56 │        41.62 / 42.07 ±0.57 / 43.19 ms │        42.96 / 43.28 ±0.24 / 43.64 ms │     no change │
│ QQuery 57 │     187.59 / 191.07 ±2.00 / 193.20 ms │     191.85 / 193.30 ±1.38 / 195.41 ms │     no change │
│ QQuery 58 │     123.84 / 124.66 ±0.44 / 125.07 ms │     120.61 / 123.17 ±1.51 / 124.88 ms │     no change │
│ QQuery 59 │     121.67 / 122.20 ±0.56 / 122.97 ms │     120.57 / 121.92 ±0.88 / 123.10 ms │     no change │
│ QQuery 60 │        41.96 / 42.50 ±0.39 / 43.13 ms │        42.25 / 42.78 ±0.38 / 43.32 ms │     no change │
│ QQuery 61 │        14.24 / 14.30 ±0.07 / 14.43 ms │        14.44 / 14.53 ±0.07 / 14.64 ms │     no change │
│ QQuery 62 │        49.34 / 49.86 ±0.29 / 50.24 ms │        48.86 / 49.80 ±1.52 / 52.82 ms │     no change │
│ QQuery 63 │        32.72 / 33.05 ±0.19 / 33.27 ms │        32.19 / 32.43 ±0.28 / 32.97 ms │     no change │
│ QQuery 64 │     495.24 / 501.59 ±6.70 / 513.86 ms │     492.56 / 497.63 ±3.75 / 502.42 ms │     no change │
│ QQuery 65 │     149.29 / 152.59 ±2.31 / 155.63 ms │     153.14 / 156.85 ±2.60 / 161.08 ms │     no change │
│ QQuery 66 │        86.71 / 88.91 ±1.30 / 90.44 ms │        86.27 / 90.26 ±4.05 / 98.06 ms │     no change │
│ QQuery 67 │     262.50 / 269.09 ±4.74 / 274.49 ms │     266.01 / 272.73 ±4.14 / 278.81 ms │     no change │
│ QQuery 68 │        14.25 / 14.64 ±0.23 / 14.85 ms │        14.85 / 15.03 ±0.21 / 15.38 ms │     no change │
│ QQuery 69 │        81.94 / 84.11 ±2.12 / 88.00 ms │        82.21 / 85.06 ±5.13 / 95.32 ms │     no change │
│ QQuery 70 │     110.46 / 112.49 ±2.02 / 116.35 ms │     109.60 / 115.95 ±6.54 / 124.14 ms │     no change │
│ QQuery 71 │        38.30 / 39.55 ±1.99 / 43.46 ms │        37.36 / 37.54 ±0.15 / 37.77 ms │ +1.05x faster │
│ QQuery 72 │ 2175.03 / 2325.52 ±88.51 / 2444.48 ms │ 2314.95 / 2373.68 ±38.31 / 2425.89 ms │     no change │
│ QQuery 73 │        10.79 / 11.10 ±0.29 / 11.51 ms │        10.45 / 10.62 ±0.12 / 10.76 ms │     no change │
│ QQuery 74 │     206.09 / 208.67 ±1.49 / 210.17 ms │     195.10 / 200.32 ±6.41 / 211.59 ms │     no change │
│ QQuery 75 │     155.97 / 158.32 ±1.80 / 160.67 ms │     156.02 / 158.77 ±1.86 / 161.77 ms │     no change │
│ QQuery 76 │        37.66 / 38.75 ±1.68 / 42.04 ms │        37.89 / 38.50 ±0.47 / 39.26 ms │     no change │
│ QQuery 77 │        64.99 / 66.20 ±0.67 / 66.91 ms │        64.74 / 65.94 ±0.70 / 66.89 ms │     no change │
│ QQuery 78 │     202.83 / 206.45 ±3.27 / 210.58 ms │     201.87 / 206.98 ±4.10 / 210.88 ms │     no change │
│ QQuery 79 │        69.64 / 71.02 ±1.23 / 72.96 ms │        71.07 / 71.50 ±0.39 / 72.21 ms │     no change │
│ QQuery 80 │     106.92 / 109.00 ±2.04 / 112.87 ms │     106.83 / 108.13 ±1.09 / 109.68 ms │     no change │
│ QQuery 81 │        26.49 / 27.59 ±1.67 / 30.86 ms │        26.37 / 26.78 ±0.23 / 27.03 ms │     no change │
│ QQuery 82 │        18.25 / 18.61 ±0.21 / 18.87 ms │        18.61 / 18.73 ±0.10 / 18.91 ms │     no change │
│ QQuery 83 │        39.97 / 40.40 ±0.29 / 40.88 ms │        40.16 / 41.17 ±1.42 / 43.96 ms │     no change │
│ QQuery 84 │        45.46 / 46.40 ±1.58 / 49.54 ms │        45.58 / 45.84 ±0.33 / 46.49 ms │     no change │
│ QQuery 85 │     145.11 / 146.33 ±1.23 / 148.46 ms │     144.07 / 144.94 ±0.48 / 145.39 ms │     no change │
│ QQuery 86 │        27.17 / 27.58 ±0.27 / 27.96 ms │        26.17 / 26.48 ±0.26 / 26.84 ms │     no change │
│ QQuery 87 │        72.71 / 74.93 ±1.56 / 76.67 ms │        71.79 / 72.42 ±0.39 / 72.87 ms │     no change │
│ QQuery 88 │        66.63 / 67.67 ±1.03 / 69.60 ms │        67.28 / 68.12 ±0.93 / 69.86 ms │     no change │
│ QQuery 89 │        38.56 / 38.95 ±0.33 / 39.55 ms │        38.47 / 39.04 ±0.69 / 40.37 ms │     no change │
│ QQuery 90 │        19.05 / 19.39 ±0.20 / 19.68 ms │        18.98 / 19.13 ±0.09 / 19.22 ms │     no change │
│ QQuery 91 │        55.48 / 56.13 ±0.40 / 56.69 ms │        55.26 / 55.55 ±0.32 / 56.15 ms │     no change │
│ QQuery 92 │        32.86 / 33.09 ±0.13 / 33.24 ms │        31.83 / 33.11 ±1.92 / 36.93 ms │     no change │
│ QQuery 93 │        54.32 / 56.40 ±1.53 / 58.12 ms │        54.62 / 56.82 ±2.18 / 60.32 ms │     no change │
│ QQuery 94 │        42.09 / 42.63 ±0.45 / 43.37 ms │        42.15 / 43.01 ±0.74 / 44.10 ms │     no change │
│ QQuery 95 │        91.09 / 91.95 ±0.72 / 93.02 ms │        92.95 / 93.70 ±0.51 / 94.19 ms │     no change │
│ QQuery 96 │        25.62 / 25.81 ±0.13 / 25.94 ms │        25.27 / 25.68 ±0.31 / 26.18 ms │     no change │
│ QQuery 97 │        48.19 / 49.05 ±0.78 / 50.37 ms │        48.75 / 49.22 ±0.31 / 49.68 ms │     no change │
│ QQuery 98 │        44.16 / 44.72 ±0.39 / 45.16 ms │        44.17 / 45.28 ±0.74 / 46.37 ms │     no change │
│ QQuery 99 │        72.41 / 73.55 ±1.25 / 75.89 ms │        71.23 / 71.83 ±0.39 / 72.45 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 11275.90ms │
│ Total Time (virtual-columns-table-schema)   │ 11351.25ms │
│ Average Time (HEAD)                         │   115.06ms │
│ Average Time (virtual-columns-table-schema) │   115.83ms │
│ Queries Faster                              │          1 │
│ Queries Slower                              │          5 │
│ Queries with No Change                      │         92 │
│ Queries with Failure                        │          1 │
└─────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 60.0s
Peak memory 6.9 GiB
Avg memory 6.2 GiB
CPU user 258.3s
CPU sys 6.7s
Peak spill 0 B

tpcds — branch

Metric Value
Wall time 60.0s
Peak memory 6.7 GiB
Avg memory 6.0 GiB
CPU user 261.9s
CPU sys 7.2s
Peak spill 0 B


@mbutrovich
Contributor Author

Following up on this: I added debug_assert! guards on TableSchema::with_virtual_columns (rejects a virtual name that matches a file, partition, or already-registered virtual column) and on TableSchema::with_table_partition_cols (rejects a partition name that matches an existing virtual column, so the invariant holds regardless of builder call order). Release builds pay no validation cost, matching the partition-column convention I described above: upstream avoids bad names, core structs stay cheap, and mistakes surface loudly in dev and CI. Four #[should_panic] tests cover the collision shapes (virtual-vs-file, virtual-vs-partition, virtual-vs-virtual, and partition-added-after-colliding-virtual).

File-vs-partition collisions in TableSchema are still unchecked, matching ListingTable::try_new and Arrow's SchemaBuilder::push. Happy to tighten that in a separate PR if we want to revisit the convention more broadly.
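
A minimal sketch of the guard shape described above, using toy stand-ins for Arrow's types (the `Arc<str>` alias and field names are illustrative, not the PR's code). The point is the convention: the check costs nothing in release builds and panics loudly in dev/CI builds:

```rust
use std::collections::HashSet;
use std::sync::Arc;

// `FieldRef` stands in for Arrow's Arc<Field>; here it is just a column name.
type FieldRef = Arc<str>;

// Illustrative layout: [file, partition, virtual], as in the PR description.
pub struct TableSchema {
    pub file_fields: Vec<FieldRef>,
    pub partition_fields: Vec<FieldRef>,
    pub virtual_fields: Vec<FieldRef>,
}

impl TableSchema {
    /// Virtual columns are appended at the end of the table schema.
    pub fn with_virtual_columns(mut self, virtual_columns: Vec<FieldRef>) -> Self {
        // Collect every name already registered (file, partition, virtual).
        let taken: HashSet<&str> = self
            .file_fields
            .iter()
            .chain(&self.partition_fields)
            .chain(&self.virtual_fields)
            .map(|f| f.as_ref())
            .collect();
        for vc in &virtual_columns {
            // Free in release builds; loud in dev and CI builds.
            debug_assert!(
                !taken.contains(vc.as_ref()),
                "virtual column `{vc}` collides with an existing column"
            );
        }
        self.virtual_fields.extend(virtual_columns);
        self
    }
}
```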

/// Virtual columns are appended at the end of the table schema, after any
/// partition columns.
pub fn with_virtual_columns(mut self, virtual_columns: Vec<FieldRef>) -> Self {
debug_assert!(
Contributor


Should we raise an error if found name collision? SchemaBuilder won't check this.

Contributor Author


Thanks for the question @niebayes: I addressed this in #22026 (comment). My argument is that we should handle this in #20135 like partition columns. The debug assertion merely expresses the contract in code without a runtime overhead for release builds.

@comphead
Contributor

comphead commented May 7, 2026

I made a hybrid (genAI + manual) initial review.

P1. Per-file validation of morselizer-level constants

opener.rs:546,550 — validate_supported_virtual_columns and validate_predicate_does_not_reference_virtual_columns run inside prepare_open_file (once per file). Both are pure functions of self.table_schema.virtual_columns() and self.predicate, which are fixed for the morselizer's lifetime. In a scan over 10K files this is 10K redundant tree-walks of the predicate and 10K HashSet<&str> rebuilds (opener.rs:1440-1441).

Fix: hoist both checks into ParquetMorselizerBuilder::build() (or wherever the morselizer becomes immutable). Per-file cost drops to zero.

P2. Reallocating null_replacements and helper schemas per file

- opener.rs:1187-1193 — null_replacements: HashMap<String, ScalarValue> is built per build_stream call. Virtual columns are static across files, so this map could live on the morselizer.
- opener.rs:849,851,1244 — three append_fields(...) calls that allocate fresh Schemas per file (logical-for-rewrite, physical-for-rewrite, stream_schema). Could be computed once when virtual cols are fixed.

These are all cold-path (setup-per-file), so it's minor — but gratis if you refactor P1.

P3. virtual_columns: Vec<FieldRef> cloned into PreparedParquetOpen

opener.rs:663 — self.table_schema.virtual_columns().clone() clones the Vec per file. TableSchema already stores Arc<Vec<FieldRef>> internally, but the getter returns &Vec<FieldRef>. Either return &Arc<Vec<FieldRef>> or store the Arc in PreparedParquetOpen. Consistent with the pre-existing partition-cols pattern, so not strictly a regression.

P4. schema_without_virtual_columns rebuilds on every try_pushdown_filters call

table_schema.rs:275-280 — allocates a fresh SchemaRef each call. It's a pure function of immutable fields; memoize it next to table_schema (same pattern the struct already uses for table_schema). Low-frequency call (planning-time), so low priority.

partitioned_file: PartitionedFile,
) -> Result<PreparedParquetOpen> {
validate_supported_virtual_columns(self.table_schema.virtual_columns())?;
if self.pushdown_filters
Contributor

@comphead comphead May 7, 2026


So if self.pushdown_filters is not set but a predicate exists, we don't need validation? Not sure if this is expected.

@mbutrovich
Contributor Author

Addressed @comphead's review:

  • P1 + P2 + P3: Introduced VirtualColumnsState (Arc-shared, holds validated fields, null_replacements, and the logical-with-virtual schema). Built once per scan partition in ParquetSource::create_morselizer; stored as Option<Arc<VirtualColumnsState>> on ParquetMorselizer and PreparedParquetOpen.
  • Per-file cost for virtual-column scans drops to one Arc::clone. The two remaining per-file append_fields calls (physical_for_rewrite, stream_schema) depend on per-file coercions/projection mask and can't be cached.
  • P4 skipped: adding OnceLock<SchemaRef> to every TableSchema to save a one-shot Vec iteration on a planning-time path is not a necessary compute-vs-memory trade.
  • opener.rs:547: Call site moved into create_morselizer with an inline comment explaining why predicate validation gates on pushdown_filters (when pushdown is off, the predicate stays above the scan as a FilterExec and resolves virtual columns there).
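
The shape of that refactor can be sketched as follows. This is a simplified illustration, not the PR's actual types: validation and the derived maps are built once when the morselizer is constructed, and each per-file open pays only a refcount bump:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Illustrative stand-in for the Arc-shared per-scan state: validated
// virtual-column fields plus the null_replacements map that used to be
// rebuilt per file.
#[derive(Debug)]
pub struct VirtualColumnsState {
    pub fields: Vec<String>,
    pub null_replacements: HashMap<String, Option<i64>>,
}

pub struct ParquetMorselizer {
    virtual_state: Option<Arc<VirtualColumnsState>>,
}

impl ParquetMorselizer {
    pub fn try_new(virtual_columns: Vec<String>) -> Result<Self, String> {
        // Validation runs here, once per scan partition, not once per file.
        for name in &virtual_columns {
            if name != "row_number" {
                return Err(format!("unsupported virtual column: {name}"));
            }
        }
        let virtual_state = if virtual_columns.is_empty() {
            None
        } else {
            let null_replacements =
                virtual_columns.iter().map(|n| (n.clone(), None)).collect();
            Some(Arc::new(VirtualColumnsState {
                fields: virtual_columns,
                null_replacements,
            }))
        };
        Ok(Self { virtual_state })
    }

    pub fn prepare_open_file(&self) -> Option<Arc<VirtualColumnsState>> {
        // Per-file cost: one Arc::clone (a refcount bump), no re-validation
        // and no reallocation of the derived maps.
        self.virtual_state.clone()
    }
}
```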

@mbutrovich mbutrovich requested a review from comphead May 7, 2026 18:53
@comphead
Contributor

comphead commented May 8, 2026

Thanks @mbutrovich, I'm planning to do the second round today.

@adriangb please ping if you feel there are any blockers anticipated or discussions needed

/// above the scan; this check is defense-in-depth for callers that build plans
/// manually and set `with_pushdown_filters(true)` alongside a predicate
/// referencing virtual columns.
fn validate_predicate_does_not_reference_virtual_columns(
Contributor


Instead of erroring, could we reject just these filters during filter pushdown?

Contributor Author


Thanks @adriangb, want to make sure I understand the intent before I refactor, since the two consumer shapes we have land in different places:

  • DataFusion front-end path: planner calls try_pushdown_filters, which already reports virtual-col filters as PushedDown::No and excludes them from source.predicate. The FilterExec stays above the scan, so the opener never sees a virtual-col ref in the first place.
  • Direct-opener consumers (Comet, and anyone else bypassing the optimizer): they construct the predicate and set with_pushdown_filters(true) themselves. No try_pushdown_filters call, no guarantee the predicate was split, no FilterExec above unless they built one.

The opener-level check only matters for the second group. If we silently split and drop virtual-col conjuncts, those callers get wrong results with no signal; the current error tells them the contract and how to fix it.

Is your suggestion:

  • (a) Drop the opener check entirely and rely on the try_pushdown_filters boundary, accepting that direct-opener callers are on their own?
  • (b) Keep the check but split the predicate in the opener so non-virtual conjuncts still participate in row-filtering / pruning, rather than erroring on the whole predicate?
  • Something else?

Happy to go either way, just want to make sure we're not setting a footgun for the Comet shape.
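
For concreteness, option (b) amounts to partitioning the predicate's conjuncts by whether they reference a virtual column. A toy sketch with a stand-in `Expr` (not DataFusion's expression type):

```rust
// Toy expression type for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Does any column reference in `expr` name one of `names`?
fn references_any(expr: &Expr, names: &[&str]) -> bool {
    match expr {
        Expr::Column(c) => names.contains(&c.as_str()),
        Expr::Literal(_) => false,
        Expr::Gt(l, r) | Expr::And(l, r) => {
            references_any(l, names) || references_any(r, names)
        }
    }
}

// Flatten a tree of ANDs into its conjuncts.
fn flatten_conjuncts(expr: &Expr, out: &mut Vec<Expr>) {
    if let Expr::And(l, r) = expr {
        flatten_conjuncts(l, out);
        flatten_conjuncts(r, out);
    } else {
        out.push(expr.clone());
    }
}

/// Returns (pushable, rejected): conjuncts free of virtual-column refs can
/// still participate in row-filtering/pruning; the rest must stay above the
/// scan (or trigger an error, per the discussion above).
pub fn partition_predicate(expr: &Expr, virtual_cols: &[&str]) -> (Vec<Expr>, Vec<Expr>) {
    let mut conjuncts = Vec::new();
    flatten_conjuncts(expr, &mut conjuncts);
    conjuncts
        .into_iter()
        .partition(|c| !references_any(c, virtual_cols))
}
```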

Contributor


I see. I didn't realize Comet sets the filters directly without going through try_pushdown_filters. My suggestion (which may not be possible) would be that Comet and other similar consumers follow the same contract the rest of DataFusion uses and go through try_pushdown_filters. If needed we could extract some bit of the optimizer rule into public functions to avoid you having to re-implement the logic of placing a FilterExec above for rejected filters, etc, but I also think that should not be that complicated.

Does that make sense or are there reasons why that would not work?

Contributor Author

@mbutrovich mbutrovich May 12, 2026


@AdamGS I know you mentioned wanting this feature, does this contract make sense for your use case? Basically, requiring try_pushdown_filters first? I'm afraid of not enforcing the contract in the opener for consumers who bypass try_pushdown_filters.

Contributor Author


I think the middle-ground I would like: we enforce the contract in a debug assertion in the opener, so it's not on the critical path of release builds, but developers who are building on top of DataFusion (particularly in dev builds or CI builds with debug assertions enabled) will get an error. Not having that feels like too much of a footgun to me, since we don't have a good way of enforcing the contract that try_pushdown_filters was called first. What do you think @adriangb?

Contributor Author


I am testing using try_pushdown_filters instead of with_predicate in Comet:

apache/datafusion-comet#4299

Contributor

@adriangb adriangb May 12, 2026


That sounds like a good plan to me. If we can have try_pushdown_filters handle the rejection gracefully and the opener itself error that sounds like an ideal solution.

Contributor Author


Cool, I'll move forward with that refactor. Thanks for talking through it with me!

Contributor Author


Hopefully should be good for one more look @adriangb.

@mbutrovich mbutrovich requested a review from adriangb May 12, 2026 18:45
Comment thread datafusion/datasource-parquet/src/opener.rs Outdated
@mbutrovich mbutrovich moved this from Todo to In progress in Comet Development May 13, 2026
@mbutrovich mbutrovich requested a review from adriangb May 14, 2026 18:09
Contributor

@adriangb adriangb left a comment


I think this looks good. @mbutrovich do you intend to get this into 54? I think there's something to be said for waiting until 54 goes out at this point so we can do the rest of the work wiring up so we can derisk the design as a whole.

@mbutrovich
Contributor Author

mbutrovich commented May 14, 2026

I think this looks good. @mbutrovich do you intend to get this into 54? I think there's something to be said for waiting until 54 goes out at this point so we can do the rest of the work wiring up so we can derisk the design as a whole.

I'd like it in 54 if we think the API at this layer is stable, but I see your argument: if the API needs a tweak when we go to hook everything up, we hit API stability challenges. I am okay to defer, but I also was not planning to do the work to hook it up to the front-end any time soon, so it risks becoming an indefinite merge, maybe not completely wired in 55 either.

@adriangb
Contributor

Gotcha. If you're okay deferring until 54 (which should just be a week or two) I think that'd make me feel more comfortable taking the risk. We don't have feature freezes officially but I think it's a good general approach to take. I asked in #20135 (comment) if anyone can drive the rest of this but I'd say once 54 is out we can merge this regardless. Thanks for working on this it's been quite the effort!

@mbutrovich
Contributor Author

mbutrovich commented May 14, 2026

No worries. This isn't urgently needed in Comet, it's just on the list of Spark gaps we want to close. Thanks for your help thus far!


Labels

datasource Changes to the datasource crate

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

6 participants